Semi-supervised training


UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech

Liu, Jiaxuan, Xiang, Yang, Zhao, Han, Li, Xiangang, Gao, Yingying, Zhang, Shilei, Ling, Zhenhua

arXiv.org Artificial Intelligence

Recent large language models (LLMs) have made great progress in the field of text-to-speech (TTS), but they still face major challenges in synthesizing fine-grained emotional speech in an interpretable manner. Traditional methods rely on discrete emotion labels to control emotion categories and intensities, which cannot capture the complexity and continuity of human emotional perception and expression. The lack of large-scale emotional speech datasets with balanced emotion distributions and fine-grained emotional annotations often causes overfitting in synthesis models and impedes effective emotion control. To address these issues, we propose UDDETTS, a universal LLM framework unifying discrete and dimensional emotions for controllable emotional TTS. The model introduces the interpretable Arousal-Dominance-Valence (ADV) space for dimensional emotion description and supports emotion control driven by either discrete emotion labels or nonlinearly quantified ADV values. Furthermore, a semi-supervised training strategy is designed to comprehensively utilize diverse speech datasets with different types of emotional annotations to train UDDETTS. Experiments show that UDDETTS achieves linear emotion control along three interpretable dimensions and exhibits superior end-to-end emotional speech synthesis capabilities. Code and demos are available at: https://anonymous.4open.science/ UDDETTS is designed for large-scale emotional speech datasets and integrates discrete label and dimensional ADV annotations to enable controllable emotional TTS. LLM-based TTS models leverage the strong language understanding of LLMs to generate speech semantic tokens from text tokens, thereby achieving significant advantages in synthesizing expressive speech; current LLM-based methods primarily rely on emotion prompts for supervised fine-tuning.
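The nonlinear quantization of ADV values can be sketched in a few lines. The tanh-based warp, bin count, and function names below are illustrative assumptions, not the paper's actual scheme:

```python
import math

def quantize_adv(value, num_bins=8):
    """Nonlinearly quantize one ADV coordinate in [0, 1] to a discrete token id.

    Hypothetical scheme: a tanh warp concentrates resolution around the
    neutral midpoint before uniform binning.
    """
    centered = 2.0 * value - 1.0                          # map to [-1, 1]
    warped = math.tanh(2.0 * centered) / math.tanh(2.0)   # still in [-1, 1]
    normalized = (warped + 1.0) / 2.0                     # back to [0, 1]
    return min(int(normalized * num_bins), num_bins - 1)

def adv_to_tokens(arousal, dominance, valence, num_bins=8):
    """Map an (A, D, V) triple to three discrete control tokens."""
    return tuple(quantize_adv(v, num_bins) for v in (arousal, dominance, valence))
```

A warp of this kind gives finer bins near the neutral midpoint, one plausible way to make dimensional control feel perceptually linear.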


Info-Coevolution: An Efficient Framework for Data Model Coevolution

Qin, Ziheng, Xu, Hailun, Yew, Wei Chee, Jia, Qi, Luo, Yang, Sarkar, Kanchan, Guan, Danhui, Wang, Kai, You, Yang

arXiv.org Artificial Intelligence

Machine learning relies heavily on data, yet the continuous growth of real-world data poses challenges for efficient dataset construction and training. A fundamental yet unsolved question is: given our current model and data, does a new data sample (or batch) need to be annotated and learned? Conventional approaches retain all available data, leading to suboptimal data and training efficiency. Active learning aims to reduce data redundancy by selecting a subset of samples to annotate, but it increases pipeline complexity and introduces bias. In this work, we propose Info-Coevolution, a novel framework that efficiently enables models and data to coevolve through online selective annotation without bias. Leveraging task-specific models (and open-source models), it selectively annotates and integrates online and web data to improve datasets efficiently. For real-world datasets such as ImageNet-1K, Info-Coevolution reduces annotation and training costs by 32% without performance loss, and it determines this saving ratio automatically rather than requiring it to be tuned. It can further reduce the annotation ratio to 50% with semi-supervised learning. We also explore retrieval-based dataset enhancement using unlabeled open-source data. Code is available at https://github.com/NUS-HPC-AI-Lab/Info-Coevolution/.
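The core annotate-or-skip decision can be sketched as a simple confidence test. The max-probability threshold below is a hypothetical stand-in for the framework's actual selection rule:

```python
def needs_annotation(probs, threshold=0.9):
    """Decide whether a new sample is worth human annotation.

    Illustrative criterion: if the current model's top predicted
    probability is already high, the sample adds little information and
    can be skipped (or pseudo-labelled); otherwise, send it to annotators.
    """
    return max(probs) < threshold

def select_for_annotation(batch_probs, threshold=0.9):
    """Return indices of samples in a batch that should be annotated."""
    return [i for i, p in enumerate(batch_probs) if needs_annotation(p, threshold)]
```

Applied online, a rule like this lets the annotation budget shrink automatically as the model improves, since confident samples are filtered out for free.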


A Practitioner's Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic

Klejch, Ondřej, Lamb, William, Bell, Peter

arXiv.org Artificial Intelligence

An effective approach to the development of ASR systems for low-resource languages is to fine-tune an existing multilingual end-to-end model. When the original model has been trained on large quantities of data from many languages, fine-tuning can be effective with limited training data, even when the language in question was not present in the original training data. The fine-tuning approach has been encouraged by the availability of public-domain E2E models and is widely believed to lead to state-of-the-art results. This paper, however, challenges that belief. We show that an approach combining hybrid HMMs with self-supervised models can yield substantially better performance with limited training data. This combination allows better utilisation of all available speech and text data through continued self-supervised pre-training and semi-supervised training. We benchmark our approach on Scottish Gaelic, achieving WER reductions of 32% relative over our best fine-tuned Whisper model.


Entriever: Energy-based Retriever for Knowledge-Grounded Dialog Systems

Cai, Yucheng, Li, Ke, Huang, Yi, Feng, Junlan, Ou, Zhijian

arXiv.org Artificial Intelligence

A retriever, which retrieves relevant knowledge pieces from a knowledge base given a context, is an important component in many natural language processing (NLP) tasks. Retrievers have been introduced in knowledge-grounded dialog systems to improve knowledge acquisition. In such systems, there may be multiple relevant and correlated knowledge pieces for a given context; however, current retriever models usually assume knowledge pieces to be conditionally independent. To address this issue, we propose Entriever, an energy-based retriever. Entriever directly models a candidate retrieval result as a whole, rather than modeling the knowledge pieces separately, with the relevance score defined by an energy function. We explore various architectures of energy functions and different training methods for Entriever, and show that it substantially outperforms a strong cross-encoder baseline on knowledge retrieval tasks. Furthermore, we show that in semi-supervised training of knowledge-grounded dialog systems, Entriever enables effective scoring of retrieved knowledge pieces and significantly improves the end-to-end performance of dialog systems.
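The idea of scoring a retrieval result as a whole, rather than piece by piece, can be illustrated with a toy energy function. The unary-plus-pairwise form and the values below are assumptions for illustration, not Entriever's learned architecture:

```python
def set_energy(unary, pairwise, subset):
    """Energy of a candidate retrieval result (a set of knowledge pieces).

    Toy form: per-piece relevance energies plus pairwise interaction
    energies, so correlated pieces are scored jointly rather than
    independently. Lower energy means a more relevant set.
    """
    e = sum(unary[i] for i in subset)
    for a in range(len(subset)):
        for b in range(a + 1, len(subset)):
            e += pairwise.get((subset[a], subset[b]), 0.0)
    return e

def best_subset(unary, pairwise, candidates):
    """Pick the candidate subset with the lowest energy."""
    return min(candidates, key=lambda s: set_energy(unary, pairwise, s))
```

With a negative interaction term between two correlated pieces, the pair can beat either piece alone, which an independence assumption cannot express.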


Confidence-Guided Data Augmentation for Improved Semi-Supervised Training

Khmaissia, Fadoua, Frigui, Hichem

arXiv.org Artificial Intelligence

We propose a new strategy to improve the accuracy and robustness of image classification. First, we train a baseline CNN model. Then, we identify challenging regions of the feature space by collecting all misclassified samples, together with correctly classified samples that have low confidence values. These samples are used to train a Variational AutoEncoder (VAE), which is then used to generate synthetic images. Finally, the generated synthetic images are combined with the original labeled images to train a new model in a semi-supervised fashion. Empirical results on benchmark datasets such as STL10 and CIFAR-100 show that the synthetically generated samples further diversify the training data, improving image classification compared with fully supervised baselines that use only the available data.
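The sample-selection step described above maps fairly directly to code; the confidence threshold value here is an illustrative choice, not one taken from the paper:

```python
def select_hard_samples(predictions, labels, confidences, conf_threshold=0.7):
    """Identify 'challenging' samples to train the VAE on.

    Following the described strategy: keep every misclassified sample,
    plus correctly classified samples whose confidence falls below a
    threshold (threshold value is illustrative).
    """
    hard = []
    for i, (pred, label, conf) in enumerate(zip(predictions, labels, confidences)):
        if pred != label or conf < conf_threshold:
            hard.append(i)
    return hard
```

The returned indices would then select the images used to fit the VAE before the synthetic-augmentation stage.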


Semi-supervised 3D Object Detection via Temporal Graph Neural Networks

Wang, Jianren, Gang, Haiming, Ancha, Siddarth, Chen, Yi-Ting, Held, David

arXiv.org Artificial Intelligence

3D object detection plays an important role in autonomous driving and other robotics applications. However, these detectors usually require training on large amounts of annotated data that is expensive and time-consuming to collect. Instead, we propose leveraging large amounts of unlabeled point cloud videos by semi-supervised learning of 3D object detectors via temporal graph neural networks. Our insight is that temporal smoothing can create more accurate detection results on unlabeled data, and these smoothed detections can then be used to retrain the detector. We learn to perform this temporal reasoning with a graph neural network, where edges represent the relationship between candidate detections in different time frames. After semi-supervised learning, our method achieves state-of-the-art detection performance on the challenging nuScenes and H3D benchmarks, compared to baselines trained on the same amount of labeled data. Project and code are released at https://www.jianrenw.com/SOD-TGNN/.
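The role of temporal smoothing as a source of pseudo-labels can be illustrated with a moving average over a track's per-frame detection scores. This is a hand-written stand-in for the learned graph neural network, with illustrative parameter values:

```python
def temporal_smooth(track_scores, window=3):
    """Smooth per-frame detection scores along one tracked object.

    A simple moving average standing in for the learned graph reasoning:
    confidence is shared across neighbouring frames, so a detection that
    flickers off for a single frame can still survive.
    """
    n = len(track_scores)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2 + 1)
        smoothed.append(sum(track_scores[lo:hi]) / (hi - lo))
    return smoothed

def pseudo_labels(track_scores, threshold=0.5):
    """Indices of frames kept as pseudo-labels after smoothing."""
    return [i for i, s in enumerate(temporal_smooth(track_scores)) if s >= threshold]
```

In the example below, the middle frame's raw score (0.2) would be discarded on its own, but borrowing confidence from its neighbours keeps it, which is exactly the effect that makes the smoothed detections useful for retraining.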


Cost-effective speech-to-text with weakly- and semi-supervised training

AIHub

Voice assistants equipped with speech-to-text technology have seen a major boost in performance and usage, thanks to powerful new machine learning methods based on deep neural networks. These methods follow a supervised learning approach, requiring large amounts of paired speech-text data to train the best performing speech-to-text transcription models. After collecting large amounts of relevant and diverse spoken utterances, the complex and intensive task of annotating and labelling the collected speech data awaits. To get a feel for a typical scenario, let's look at some estimates. On average, a typical user query, for example "Do you have the Christmas edition with Santa?", would last about 3 seconds.


Autoencoder-based cleaning in probabilistic databases

Mauritz, R. R., Nijweide, F. P. J., Goseling, J., van Keulen, M.

arXiv.org Artificial Intelligence

In the field of data integration, data quality problems are often encountered when extracting, combining, and merging data. The probabilistic data integration approach represents information about such problems as uncertainties in a probabilistic database. In this paper, we propose a data-cleaning autoencoder capable of near-automatic data quality improvement. It learns the structure and dependencies in the data to identify and correct doubtful values. A theoretical framework is provided, and experiments show that the method can remove significant amounts of noise from categorical and numeric probabilistic data. Our method does not require clean data; we do, however, show that manually cleaning a small fraction of the data significantly improves performance. I. Introduction. Data quality problems are a major threat in data science. Specifically, in the field of data integration, i.e., combining several data sources into a single and unified view [1], uncertainties are often encountered when extracting, combining, and merging data. These uncertainties can result from the nature of the data (such as noise in measurements), but can also result from the integration process itself. Information about these uncertainties is considered an important result of the integration process [2]. In probabilistic data integration (PDI), the integration result as well as information on uncertainty is stored in a probabilistic database (PDB) [3]. The PDB maintains possible alternatives for values and records, their likelihoods, and the dependencies among them. The PDI process (see Figure 1) consists of two main phases: the initial integration phase is followed by the improvement phase, which gathers evidence while the data is being used, for the purpose of gradually improving its quality.
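One way such an autoencoder could be used for cleaning is to score each candidate alternative for a doubtful value by reconstruction error. This minimal sketch assumes numeric records and a hypothetical `autoencoder` callable; it is not the paper's implementation:

```python
def clean_value(autoencoder, record, doubtful_index, candidates):
    """Pick the most plausible alternative for one doubtful attribute.

    For each candidate value, substitute it into the record, reconstruct
    the record with the (hypothetical) trained autoencoder, and keep the
    candidate whose record reconstructs with the lowest squared error.
    """
    def recon_error(rec):
        out = autoencoder(rec)
        return sum((a - b) ** 2 for a, b in zip(rec, out))

    return min(
        candidates,
        key=lambda v: recon_error(
            record[:doubtful_index] + [v] + record[doubtful_index + 1:]),
    )
```

Because the autoencoder has learned the dependencies between attributes, a candidate that violates them reconstructs poorly and is rejected, which mirrors how the alternatives stored in a probabilistic database could be ranked.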


Twin Neural Network Regression is a Semi-Supervised Regression Algorithm

Wetzel, Sebastian J., Melko, Roger G., Tamblyn, Isaac

arXiv.org Artificial Intelligence

Twin neural network regression (TNNR) is a semi-supervised regression algorithm: it can be trained on unlabelled data points as long as other, labelled, anchor data points are present. TNNR is trained to predict differences between the target values of two data points rather than the targets themselves. By ensembling the predicted differences between an unseen data point and all training data points, it is possible to obtain a very accurate prediction for the original regression problem. Since any loop of predicted differences should sum to zero, loops can be supplied as training data even when the data points within a loop are unlabelled. Semi-supervised training significantly improves TNNR performance, which is already state of the art.
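The difference-ensembling prediction and the loop-consistency condition can both be written in a few lines; `diff_model` below is a hypothetical callable standing in for the trained twin network:

```python
def tnnr_predict(diff_model, x, anchors):
    """Predict a target by ensembling predicted differences to labelled anchors.

    `diff_model(a, b)` is assumed to estimate f(a) - f(b); the final
    prediction averages (anchor_label + predicted difference) over all
    labelled anchors.
    """
    estimates = [label + diff_model(x, ax) for ax, label in anchors]
    return sum(estimates) / len(estimates)

def loop_residual(diff_model, points):
    """Sum of predicted differences around a closed loop; ideally zero.

    This is the consistency condition that lets unlabelled points enter
    training: the loop constraint can be enforced without knowing any
    of the targets involved.
    """
    total = 0.0
    for i in range(len(points)):
        total += diff_model(points[i], points[(i + 1) % len(points)])
    return total
```

A perfectly consistent difference model makes every loop residual exactly zero; penalizing nonzero residuals on loops through unlabelled points is what makes the algorithm semi-supervised.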


Semi-supervised acoustic and language model training for English-isiZulu code-switched speech recognition

Biswas, A., de Wet, F., van der Westhuizen, E., Niesler, T. R.

arXiv.org Machine Learning

We present an analysis of semi-supervised acoustic and language model training for English-isiZulu code-switched ASR using soap opera speech. Approximately 11 hours of untranscribed multilingual speech was transcribed automatically using four bilingual code-switching transcription systems operating in English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. These transcriptions were incorporated into the acoustic and language model training sets. Results showed that the TDNN-F acoustic models benefit from the additional semi-supervised data and that even better performance could be achieved by including additional CNN layers. Using these CNN-TDNN-F acoustic models, a first iteration of semi-supervised training achieved an absolute mixed-language WER reduction of 3.4%, and a further 2.2% after a second iteration. Although the languages in the untranscribed data were unknown, the best results were obtained when all automatically transcribed data was used for training and not just the utterances classified as English-isiZulu. Despite reducing perplexity, the semi-supervised language model was not able to improve the ASR performance.